Fix SplitVariants task in TasksGenotypeBatch.wdl to be compatible with downstream analysis #647
# array and increments the counter for that array
line = line.strip('\n').split('\t')
line[4], line[5] = line[5], line[4]
SVTYPE_FIELD = 5
Instead of reassigning SVTYPE_FIELD here, you should either set SVTYPE_FIELD to 5 at the beginning or (my preference) move the code that swaps the fields to right before you append a new line to current_lines.
This was addressed and the value was set to 5.
'ins': {'condition': lambda line: bca and line[SVTYPE_FIELD] == 'INS'}
}

current_lines = {prefix: [] for prefix in condition_prefixes.keys()}
current_counts = {prefix: 0 for prefix in condition_prefixes.keys()}
current_suffixes = {prefix: 'a' for prefix in condition_prefixes.keys()}

# Open the bed file and process
Please keep the comments throughout the script to help document the code's functionality.
More comments were added.
# Checks which condition and prefix the current line matches and appends it to the corresponding
# array and increments the counter for that array
line = line.strip('\n').split('\t')
line[4], line[5] = line[5], line[4]
This needs a comment explaining what it's doing.
A comment was added.
The SplitVariants task used to swap columns 5 and 6 of its bed file output, which is read by downstream tasks in TrainRDGenotyping.GenotypePESR. With the swap missing, TrainRDGenotyping.GenotypePESR errors out, reporting:
Error: WARNING: Incorrect CNV type specified
1: stop("WARNING: Incorrect CNV type specified")
The Python script splitvariants.py was modified to switch the columns into the order that the downstream analysis requires.
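As a rough illustration of the column switch described above, here is a minimal sketch rather than the actual contents of splitvariants.py. The helper name `swap_columns_5_and_6` and the example record are hypothetical, and the column layout of the example (SV type in column 5, sample list in column 6) is an assumption made only for demonstration. The swap itself and the value `SVTYPE_FIELD = 5` match the snippets under review.

```python
# Index of the SV type field AFTER the swap, set once up front
# (as suggested in review) instead of being reassigned inside the loop.
SVTYPE_FIELD = 5


def swap_columns_5_and_6(bed_line):
    """Split a tab-delimited bed line and swap its 5th and 6th columns.

    Hypothetical helper for illustration; not part of the real script.
    """
    fields = bed_line.strip('\n').split('\t')
    if len(fields) < 6:
        raise ValueError('expected at least 6 tab-delimited columns')
    # Columns 5 and 6 are indices 4 and 5 in zero-based indexing.
    fields[4], fields[5] = fields[5], fields[4]
    return fields


# Assumed example record: chrom, start, end, name, svtype, samples.
example = 'chr1\t1000\t2000\tvariant_1\tDEL\tsample_a,sample_b\n'
swapped = swap_columns_5_and_6(example)
print('\t'.join(swapped))
```

Performing the swap in one place, immediately after splitting the line, means every later lookup such as `line[SVTYPE_FIELD] == 'INS'` sees the same column order that is written to the output.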